- Introduction to COVID-19 World Vaccine Adverse Reactions Dataset
- Project workflow
- Project methods: important packages and verbs used
- Challenges and solutions - Load, Clean and Augment
- Visualizations
- Modeling
- Conclusion and discussion
May 10, 2021
PATIENTS.CSV: Contains information about the individuals that received the vaccines
## # A tibble: 3 x 35
##   VAERS_ID RECVDATE  STATE AGE_YRS CAGE_YR CAGE_MO SEX   RPT_DATE SYMPTOM_TEXT
##   <chr>    <chr>     <chr>   <dbl>   <dbl>   <dbl> <chr> <date>   <chr>
## 1 0916600  01/01/20… TX         33      33      NA F     NA       "Right side…
## 2 0916601  01/01/20… CA         73      73      NA F     NA       "Approximat…
## 3 0916602  01/01/20… WA         23      23      NA F     NA       "About 15 m…
## # … with 26 more variables: DIED <chr>, DATEDIED <chr>, L_THREAT <chr>,
## #   ER_VISIT <chr>, HOSPITAL <chr>, HOSPDAYS <dbl>, X_STAY <chr>,
## #   DISABLE <chr>, RECOVD <chr>, VAX_DATE <chr>, ONSET_DATE <chr>,
## #   NUMDAYS <dbl>, LAB_DATA <chr>, V_ADMINBY <chr>, V_FUNDBY <chr>,
## #   OTHER_MEDS <chr>, CUR_ILL <chr>, HISTORY <chr>, PRIOR_VAX <chr>,
## #   SPLTTYPE <chr>, FORM_VERS <dbl>, TODAYS_DATE <chr>, BIRTH_DEFECT <chr>,
## #   OFC_VISIT <chr>, ER_ED_VISIT <chr>, ALLERGIES <chr>
VACCINES.CSV: Contains information about the received vaccine
## # A tibble: 3 x 8
##   VAERS_ID VAX_TYPE VAX_MANU VAX_LOT VAX_DOSE_SERIES VAX_ROUTE VAX_SITE VAX_NAME
##   <chr>    <chr>    <chr>    <chr>   <chr>           <chr>     <chr>    <chr>
## 1 0916600  COVID19  "MODERN… 037K20A 1               IM        LA       COVID19…
## 2 0916601  COVID19  "MODERN… 025L20A 1               IM        RA       COVID19…
## 3 0916602  COVID19  "PFIZER… EL1284  1               IM        LA       COVID19…
SYMPTOMS.CSV: Contains information about the symptoms experienced after vaccination
## # A tibble: 3 x 11
##   VAERS_ID SYMPTOM1      SYMPTOMVERSION1 SYMPTOM2   SYMPTOMVERSION2 SYMPTOM3
##   <chr>    <chr>                   <dbl> <chr>                <dbl> <chr>
## 1 0916600  Dysphagia                23.1 Epiglotti…            23.1 <NA>
## 2 0916601  Anxiety                  23.1 Dyspnoea              23.1 <NA>
## 3 0916602  Chest discom…            23.1 Dysphagia             23.1 Pain in ext…
## # … with 5 more variables: SYMPTOMVERSION3 <dbl>, SYMPTOM4 <chr>,
## #   SYMPTOMVERSION4 <dbl>, SYMPTOM5 <chr>, SYMPTOMVERSION5 <dbl>
Load and clean
Augment
Visualizations and modeling
Important verbs and tools used:
CHALLENGE: Multiple large files
SOLUTION: Keep them compressed and only decompress when reading into R:
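A minimal sketch of reading a compressed file directly, assuming the files are stored as gzip-compressed CSVs (the slides do not say which compression format was used). The temporary file and the two-row toy data are hypothetical; `read_csv()` from readr decompresses `.gz` input on the fly.

```r
library(readr)

# Hypothetical demo file: write a tiny gzip-compressed CSV to a temp location.
# write_csv() compresses automatically because the path ends in .gz.
tmp <- tempfile(fileext = ".csv.gz")
write_csv(
  data.frame(VAERS_ID = c("0916600", "0916601"), STATE = c("TX", "CA")),
  tmp
)

# read_csv() decompresses .gz input while reading, so no unpacked copy
# has to be kept on disk.
patients_demo <- read_csv(tmp, col_types = cols(.default = col_character()))
patients_demo
```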
CHALLENGE: Wrong column types automatically assigned by R
## Warning: 241 parsing failures.
##  row          col           expected     actual         file
## 1465 BIRTH_DEFECT 1/0/T/F/TRUE/FALSE Y          <connection>
## 2742 X_STAY       1/0/T/F/TRUE/FALSE Y          <connection>
## 2807 RPT_DATE     1/0/T/F/TRUE/FALSE 2021-01-04 <connection>
## 2807 V_FUNDBY     1/0/T/F/TRUE/FALSE OTH        <connection>
## 2811 RPT_DATE     1/0/T/F/TRUE/FALSE 2021-01-04 <connection>
## .... ............ .................. .......... ............
## See problems(...) for more details.
SOLUTION: Manually assign column types
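The warning arises because readr guesses each column type from the first rows only; a column that is empty there is guessed as logical and later values such as `Y` then fail to parse. A sketch of declaring the types explicitly with `col_types`, using a hypothetical five-column toy file (the real file has 35 columns, and the types chosen here are taken from the tibble header shown earlier):

```r
library(readr)

# Hypothetical toy file: the first row leaves several columns empty,
# which is what misleads type guessing on the real data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "VAERS_ID,AGE_YRS,RPT_DATE,BIRTH_DEFECT,V_FUNDBY",
  "0916600,33,,,",
  "0916601,73,2021-01-04,Y,OTH"
), tmp)

# Declare every column type instead of letting readr guess.
patients_demo <- read_csv(
  tmp,
  col_types = cols(
    VAERS_ID     = col_character(),   # keeps the leading zero
    AGE_YRS      = col_double(),
    RPT_DATE     = col_date(format = "%Y-%m-%d"),
    BIRTH_DEFECT = col_character(),
    V_FUNDBY     = col_character()
  )
)
patients_demo
```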
CHALLENGE: NA strings (“NA”, “N/A”, “Unknown”, “ ”, …)
SOLUTION:
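One way to handle this, sketched under the assumption that the NA strings are declared at read time through the `na` argument of `read_csv()` (the original code is not shown, so the author may have used a different approach). The toy file and the `na_strings` vector are hypothetical, and the real list of NA spellings is longer.

```r
library(readr)

# Hypothetical toy file with several spellings of "missing".
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "VAERS_ID,ALLERGIES,HISTORY",
  "0916600,N/A,Unknown",
  "0916601,Penicillin,NA"
), tmp)

# Every string listed here is read as a real NA.
na_strings <- c("", " ", "NA", "N/A", "Unknown")

patients_demo <- read_csv(
  tmp,
  na = na_strings,
  col_types = cols(.default = col_character())
)
patients_demo
```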
| CHALLENGE | SOLUTION |
|---|---|
| Unwanted columns | select(-c()) |
| NAs that should be interpreted as “no” | replace_na() |
| Row duplications | distinct() |
| Individuals who got more than one vaccine type (generates noise) | add_count(VAERS_ID) %>% filter(n==1) %>% select(-n) |
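The verbs in the table can be chained into one pipeline. The sketch below uses a hypothetical five-row toy tibble; the IDs and lot values are made up, and treating `DIED` as a column where NA means "no" is an assumption for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: report "1" is duplicated, report "3" lists two
# different manufacturers.
reports_demo <- tibble(
  VAERS_ID = c("1", "1", "2", "3", "3"),
  VAX_MANU = c("MODERNA", "MODERNA", "MODERNA", "PFIZER-BIONTECH", "JANSSEN"),
  DIED     = c(NA, NA, "Y", NA, NA),
  VAX_LOT  = c("a", "a", "b", "c", "d")
)

cleaned <- reports_demo %>%
  select(-c(VAX_LOT)) %>%            # drop unwanted columns
  replace_na(list(DIED = "N")) %>%   # NA here is interpreted as "no"
  distinct() %>%                     # remove duplicated rows
  add_count(VAERS_ID) %>%            # n = rows remaining per report
  filter(n == 1) %>%                 # keep reports with a single vaccine type
  select(-n)
cleaned
```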
CHALLENGE: Some columns contain long free-text descriptions that must be converted into tidy variables
SOLUTION: Derive a categorical variable from the text
Example: ALLERGIES column:
Create a categorical variable that states whether the patient has allergies:
Clean categorical HAS_ALLERGIES column:
## # A tibble: 5 x 3
##   VAERS_ID ALLERGIES                                               HAS_ALLERGIES
##   <chr>    <chr>                                                   <chr>
## 1 0916603  Diclofenac, novacaine, lidocaine, pickles, tomatoes, m… Y
## 2 0916604  <NA>                                                    N
## 3 0916660  Penicillin                                              Y
## 4 0916685  none that I am aware of                                 N
## 5 0917437  No known allergies                                      N
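A column like HAS_ALLERGIES could be derived roughly as follows. This is a sketch, not the original code: the pattern `no_allergy_pattern` is a hypothetical rule that treats NA and entries starting with "no", "none", "NKA" or "NKDA" as "no allergies", and the first ALLERGIES entry is shortened.

```r
library(dplyr)
library(stringr)

patients_demo <- tibble(
  VAERS_ID  = c("0916603", "0916604", "0916660", "0916685", "0917437"),
  ALLERGIES = c("Diclofenac, novacaine", NA, "Penicillin",
                "none that I am aware of", "No known allergies")
)

# Hypothetical rule: free text that starts with a negation means "N".
no_allergy_pattern <- regex("^(no|none|nka|nkda)\\b", ignore_case = TRUE)

patients_demo <- patients_demo %>%
  mutate(
    HAS_ALLERGIES = case_when(
      is.na(ALLERGIES)                          ~ "N",
      str_detect(ALLERGIES, no_allergy_pattern) ~ "N",
      TRUE                                      ~ "Y"
    )
  )
patients_demo
```

Real free text is messier than this toy rule covers, so the result should be spot-checked against the raw ALLERGIES values.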
Another example: OTHER_MEDS column
Detect individuals who took anti-inflammatory or steroid drugs before vaccination (not recommended):
Clean, categorical TAKES_ANTIINFLAMMATORY and TAKES_STEROIDS columns:
## # A tibble: 4 x 4
##   VAERS_ID OTHER_MEDS                           TAKES_ANTIINFLAM… TAKES_STEROIDS
##   <chr>    <chr>                                <chr>             <chr>
## 1 0918421  1 aspirin a day 81 mg, levothyroxin… Y                 N
## 2 0921732  Ibuprofen - PRN States she does no…  Y                 N
## 3 0932980  Hydrocortisone 25mg daily. Fludroc…  N                 Y
## 4 0934539  Singulair, Oxybutynin, Fosamax, Pre… N                 Y
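The two flags can be sketched with `str_detect()` against a list of drug names. The drug lists below are short, hypothetical examples and not the lists used in the project; the OTHER_MEDS entries are shortened versions of the rows shown above.

```r
library(dplyr)
library(stringr)

meds_demo <- tibble(
  VAERS_ID   = c("0918421", "0921732", "0932980"),
  OTHER_MEDS = c("1 aspirin a day 81 mg", "Ibuprofen - PRN",
                 "Hydrocortisone 25mg daily.")
)

# Hypothetical, deliberately incomplete drug-name patterns.
antiinflammatory <- regex("aspirin|ibuprofen|naproxen", ignore_case = TRUE)
steroids <- regex("hydrocortisone|prednisone|dexamethasone", ignore_case = TRUE)

meds_demo <- meds_demo %>%
  mutate(
    # missing = "N": a report without medication text counts as "N"
    TAKES_ANTIINFLAMMATORY = if_else(
      str_detect(OTHER_MEDS, antiinflammatory), "Y", "N", missing = "N"
    ),
    TAKES_STEROIDS = if_else(
      str_detect(OTHER_MEDS, steroids), "Y", "N", missing = "N"
    )
  )
meds_demo
```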
CHALLENGE: Symptoms are recorded in a way that makes later analysis difficult
## # A tibble: 5 x 6
##   VAERS_ID SYMPTOM1           SYMPTOM2        SYMPTOM3 SYMPTOM4         SYMPTOM5
##   <chr>    <chr>              <chr>           <chr>    <chr>            <chr>
## 1 0916618  Injection site pa… Pain            <NA>     <NA>             <NA>
## 2 0916619  Injection site pa… Menorrhagia     <NA>     <NA>             <NA>
## 3 0916620  Arthralgia         Chills          Headache Mobility decrea… Myalgia
## 4 0916620  Nausea             Pain in extrem… Pyrexia  <NA>             <NA>
## 5 0916621  Chills             Fatigue         Headache Myalgia          <NA>
SOLUTION: The 20 most common symptoms are identified and turned into TRUE/FALSE columns, one row per VAERS_ID
## # A tibble: 3 x 21
##   VAERS_ID HEADACHE PYREXIA CHILLS FATIGUE PAIN  PAIN_IN_EXTREMITY NAUSEA
##   <chr>    <lgl>    <lgl>   <lgl>  <lgl>   <lgl> <lgl>             <lgl>
## 1 0916600  FALSE    FALSE   FALSE  FALSE   FALSE FALSE             FALSE
## 2 0916601  FALSE    FALSE   FALSE  FALSE   FALSE FALSE             FALSE
## 3 0916602  FALSE    FALSE   FALSE  FALSE   FALSE TRUE              FALSE
## # … with 13 more variables: DIZZINESS <lgl>, MYALGIA <lgl>,
## #   INJECTION_SITE_ERYTHEMA <lgl>, INJECTION_SITE_PRURITUS <lgl>,
## #   INJECTION_SITE_SWELLING <lgl>, INJECTION_SITE_PAIN <lgl>, ARTHRALGIA <lgl>,
## #   DYSPNOEA <lgl>, VOMITING <lgl>, PRURITUS <lgl>, DEATH <lgl>, RASH <lgl>,
## #   ASTHENIA <lgl>
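A sketch of one way to get from the wide SYMPTOM1..SYMPTOM5 layout to TRUE/FALSE columns: pivot to long form, count, keep the most frequent symptoms, and pivot back to wide. The toy data (including the ID 0916622), the helper column `SLOT`, and `top_n_symptoms = 2` are assumptions chosen to keep the example small; the slides use the top 20.

```r
library(dplyr)
library(tidyr)
library(stringr)

symptoms_demo <- tibble(
  VAERS_ID = c("0916620", "0916620", "0916621", "0916622"),
  SYMPTOM1 = c("Arthralgia", "Nausea", "Chills", "Headache"),
  SYMPTOM2 = c("Chills", "Pyrexia", "Headache", NA),
  SYMPTOM3 = c("Headache", NA, "Myalgia", NA)
)

# One row per (report, symptom); a report spread over several rows collapses.
symptoms_long <- symptoms_demo %>%
  pivot_longer(matches("^SYMPTOM\\d$"), names_to = "SLOT",
               values_to = "SYMPTOM", values_drop_na = TRUE) %>%
  select(-SLOT) %>%
  distinct()

top_n_symptoms <- 2  # the slides use 20

top_symptoms <- symptoms_long %>%
  count(SYMPTOM, sort = TRUE) %>%
  slice_head(n = top_n_symptoms) %>%
  pull(SYMPTOM)

symptom_flags <- symptoms_long %>%
  filter(SYMPTOM %in% top_symptoms) %>%
  mutate(SYMPTOM = str_to_upper(str_replace_all(SYMPTOM, " ", "_")),
         present = TRUE) %>%
  pivot_wider(names_from = SYMPTOM, values_from = present,
              values_fill = FALSE) %>%
  # bring back reports that had none of the top symptoms
  right_join(distinct(symptoms_demo, VAERS_ID), by = "VAERS_ID") %>%
  mutate(across(-VAERS_ID, ~ coalesce(.x, FALSE)))
symptom_flags
```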
Counts by sex (SEX):

| SEX | n |
|---|---|
| F | 24070 |
| M | 8514 |
| NA | 828 |

Counts by vaccine manufacturer (VAX_MANU):

| VAX_MANU | n |
|---|---|
| JANSSEN | 1106 |
| MODERNA | 16253 |
| PFIZER-BIONTECH | 16053 |
## # A tibble: 7 x 6
##   term           estimate std.error statistic p.value  odds_ratio
##   <chr>          <dbl>    <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -9.34    0.161     -58.0     0        0.0000876
## 2 SEXM           0.924    0.0573    16.1      2.18e-58 2.52
## 3 AGE_YRS        0.0915   0.00207   44.2      0        1.10
## 4 HAS_ALLERGIESY -0.100   0.0608    -1.65     9.82e- 2 0.904
## 5 HAS_ILLNESSY   1.10     0.0664    16.6      6.60e-62 3.01
## 6 HAS_COVIDY     -0.117   0.148     -0.791    4.29e- 1 0.890
## 7 HAD_COVIDY     0.00915  0.193     0.0474    9.62e- 1 1.01
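A table of this shape is what `broom::tidy()` returns for a logistic regression, with `odds_ratio = exp(estimate)` added (for example exp(0.924) ≈ 2.52 for SEXM). The sketch below assumes DIED as the outcome, which the slides do not state, and fits on simulated toy data with only two of the predictors, so its numbers are unrelated to the results above.

```r
library(dplyr)
library(broom)

# Simulated toy data; the coefficients used to generate it are arbitrary.
set.seed(1)
n <- 500
model_demo <- tibble(
  SEX     = sample(c("F", "M"), n, replace = TRUE),
  AGE_YRS = runif(n, 18, 95)
) %>%
  mutate(
    p    = plogis(-6 + 0.9 * (SEX == "M") + 0.08 * AGE_YRS),
    DIED = rbinom(n, 1, p)   # assumed outcome: 1 = died, 0 = survived
  )

# Logistic regression: log-odds of the outcome as a linear function
# of the predictors.
fit <- glm(DIED ~ SEX + AGE_YRS, data = model_demo, family = binomial())

# tidy() gives one row per term; exponentiating the log-odds estimate
# yields the odds ratio.
result <- tidy(fit) %>%
  mutate(odds_ratio = exp(estimate))
result
```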
## # A tibble: 20 x 6
##    term                        estimate std.error statistic p.value  odds_ratio
##    <chr>                       <dbl>    <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)                 -2.01    0.0287    -70.1     0        0.134
##  2 HEADACHETRUE                -1.67    0.156     -10.7     7.92e-27 0.188
##  3 PYREXIATRUE                 -0.429   0.112     -3.82     1.34e- 4 0.651
##  4 CHILLSTRUE                  -1.21    0.171     -7.11     1.17e-12 0.298
##  5 FATIGUETRUE                 -0.367   0.115     -3.19     1.41e- 3 0.693
##  6 PAINTRUE                    -0.913   0.153     -5.98     2.17e- 9 0.401
##  7 NAUSEATRUE                  -0.621   0.139     -4.46     8.17e- 6 0.538
##  8 DIZZINESSTRUE               -2.17    0.193     -11.2     2.87e-29 0.114
##  9 PAIN_IN_EXTREMITYTRUE       -1.43    0.194     -7.38     1.56e-13 0.239
## 10 MYALGIATRUE                 -1.57    0.264     -5.94     2.91e- 9 0.209
## 11 INJECTION_SITE_PAINTRUE     -1.37    0.248     -5.54     2.95e- 8 0.253
## 12 INJECTION_SITE_ERYTHEMATRUE -15.1    186.      -0.0811   9.35e- 1 0.000000285
## 13 ARTHRALGIATRUE              -1.68    0.338     -4.97     6.73e- 7 0.186
## 14 DYSPNOEATRUE                0.509    0.0845    6.02      1.73e- 9 1.66
## 15 VOMITINGTRUE                0.677    0.135     5.02      5.06e- 7 1.97
## 16 PRURITUSTRUE                -3.67    0.579     -6.33     2.39e-10 0.0256
## 17 INJECTION_SITE_SWELLINGTRUE -14.4    206.      -0.0696   9.44e- 1 0.000000580
## 18 RASHTRUE                    -2.62    0.356     -7.36     1.91e-13 0.0728
## 19 ASTHENIATRUE                0.442    0.122     3.62      2.94e- 4 1.56
## 20 INJECTION_SITE_PRURITUSTRUE -14.5    227.      -0.0639   9.49e- 1 0.000000509
## # A tibble: 20 x 9
##    SYMPTOM   estimate std.error statistic p.value  conf.low conf.high odds_ratio
##    <chr>     <dbl>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
##  1 HEADACHE  -0.170   0.0954    -1.79     7.42e- 2 -0.361   0.0133    0.843
##  2 PYREXIA   0.0734   0.0967    0.760     4.48e- 1 -0.120   0.259     1.08
##  3 CHILLS    -0.0727  0.103     -0.703    4.82e- 1 -0.280   0.126     0.930
##  4 FATIGUE   0.0226   0.102     0.221     8.25e- 1 -0.183   0.219     1.02
##  5 PAIN      0.0190   0.106     0.179     8.58e- 1 -0.194   0.222     1.02
##  6 NAUSEA    -0.0574  0.116     -0.495    6.21e- 1 -0.291   0.164     0.944
##  7 DIZZINESS -0.187   0.132     -1.42     1.57e- 1 -0.455   0.0633    0.829
##  8 PAIN_IN_… -0.0720  0.133     -0.541    5.89e- 1 -0.342   0.180     0.931
##  9 MYALGIA   -0.167   0.143     -1.17     2.42e- 1 -0.458   0.102     0.846
## 10 INJECTIO… 0.0938   0.131     0.718     4.73e- 1 -0.171   0.342     1.10
## 11 INJECTIO… 0.0935   0.144     0.647     5.17e- 1 -0.201   0.366     1.10
## 12 ARTHRALG… 0.228    0.141     1.62      1.06e- 1 -0.0589  0.495     1.26
## 13 DYSPNOEA  0.325    0.137     2.38      1.75e- 2 0.0470   0.584     1.38
## 14 VOMITING  0.140    0.159     0.880     3.79e- 1 -0.186   0.439     1.15
## 15 PRURITUS  0.0229   0.166     0.138     8.90e- 1 -0.319   0.335     1.02
## 16 INJECTIO… -0.364   0.201     -1.81     7.01e- 2 -0.784   0.00822   0.695
## 17 DEATH     0.801    0.120     6.69      2.28e-11 0.559    1.03      2.23
## 18 RASH      -0.0669  0.180     -0.372    7.10e- 1 -0.439   0.269     0.935
## 19 ASTHENIA  0.478    0.148     3.24      1.21e- 3 0.177    0.757     1.61
## 20 INJECTIO… 0.424    0.156     2.71      6.75e- 3 0.103    0.718     1.53
## # … with 1 more variable: identified_as <chr>